Bolivar Department
OxEnsemble: Fair Ensembles for Low-Data Classification
Rystrøm, Jonathan, Fu, Zihao, Russell, Chris
We address the problem of fair classification in settings where data is scarce and unbalanced across demographic groups. Such low-data regimes are common in domains like medical imaging, where false negatives can have fatal consequences. We propose a novel approach \emph{OxEnsemble} for efficiently training ensembles and enforcing fairness in these low-data regimes. Unlike other approaches, we aggregate predictions across ensemble members, each trained to satisfy fairness constraints. By construction, \emph{OxEnsemble} is both data-efficient, carefully reusing held-out data to enforce fairness reliably, and compute-efficient, requiring little more compute than used to fine-tune or evaluate an existing model. We validate this approach with new theoretical guarantees. Experimentally, our approach yields more consistent outcomes and stronger fairness-accuracy trade-offs than existing methods across multiple challenging medical imaging classification datasets.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Switzerland (0.04)
- (9 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Regularizing Learnable Feature Extraction for Automatic Speech Recognition
Vieting, Peter, Kannen, Maximilian, Hilmes, Benedikt, Schlüter, Ralf, Ney, Hermann
Neural front-ends are an appealing alternative to traditional, fixed feature extraction pipelines for automatic speech recognition (ASR) systems since they can be directly trained to fit the acoustic model. However, their performance often falls short compared to classical methods, which we show is largely due to their increased susceptibility to overfitting. This work therefore investigates regularization methods for training ASR models with learnable feature extraction front-ends. First, we examine audio perturbation methods and show that larger relative improvements can be obtained for learnable features. Additionally, we identify two limitations in the standard use of SpecAugment for these front-ends and propose masking in the short time Fourier transform (STFT)-domain as a simple but effective modification to address these challenges. Finally, integrating both regularization approaches effectively closes the performance gap between traditional and learnable features.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- (20 more...)
Unified Learnable 2D Convolutional Feature Extraction for ASR
Vieting, Peter, Hilmes, Benedikt, Schlüter, Ralf, Ney, Hermann
Neural front-ends represent a promising approach to feature extraction for automatic speech recognition (ASR) systems as they enable to learn specifically tailored features for different tasks. Yet, many of the existing techniques remain heavily influenced by classical methods. While this inductive bias may ease the system design, our work aims to develop a more generic front-end for feature extraction. Furthermore, we seek to unify the front-end architecture contrasting with existing approaches that apply a composition of several layer topologies originating from different sources. The experiments systematically show how to reduce the influence of existing techniques to achieve a generic front-end. The resulting 2D convolutional front-end is parameter-efficient and suitable for a scenario with limited computational resources unlike large models pre-trained on unlabeled audio. The results demonstrate that this generic unified approach is not only feasible but also matches the performance of existing supervised learnable feature extractors.
- Europe > Austria > Vienna (0.14)
- Oceania > Australia > Queensland > Brisbane (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (13 more...)
Using LLMs to create analytical datasets: A case study of reconstructing the historical memory of Colombia
Anderson, David, Benitez, Galia, Bjarnadottir, Margret, Reyya, Shriyan
Colombia has been submerged in decades of armed conflict, yet until recently, the systematic documentation of violence was not a priority for the Colombian government. This has resulted in a lack of publicly available conflict information and, consequently, a lack of historical accounts. This study contributes to Colombia's historical memory by utilizing GPT, a large language model (LLM), to read and answer questions about over 200,000 violence-related newspaper articles in Spanish. We use the resulting dataset to conduct both descriptive analysis and a study of the relationship between violence and the eradication of coca crops, offering an example of policy analyses that such data can support. Our study demonstrates how LLMs have opened new research opportunities by enabling examinations of large text corpora at a previously infeasible depth.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- South America > Colombia > Bolivar Department (0.04)
- South America > Colombia > Southwest Colombia (0.04)
- (7 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Government > Military (1.00)
- Government > Regional Government > South America Government > Colombia Government (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Masked Autoencoders for Ultrasound Signals: Robust Representation Learning for Downstream Applications
Roßteutscher, Immanuel, Drese, Klaus S., Uphues, Thorsten
We investigated the adaptation and performance of Masked Autoencoders (MAEs) with Vision Transformer (ViT) architectures for self-supervised representation learning on one-dimensional (1D) ultrasound signals. Although MAEs have demonstrated significant success in computer vision and other domains, their use for 1D signal analysis, especially for raw ultrasound data, remains largely unexplored. Ultrasound signals are vital in industrial applications such as non-destructive testing (NDT) and structural health monitoring (SHM), where labeled data are often scarce and signal processing is highly task-specific. We propose an approach that leverages MAE to pre-train on unlabeled synthetic ultrasound signals, enabling the model to learn robust representations that enhance performance in downstream tasks, such as time-of-flight (ToF) classification. This study systematically investigated the impact of model size, patch size, and masking ratio on pre-training efficiency and downstream accuracy. Our results show that pre-trained models significantly outperform models trained from scratch and strong convolutional neural network (CNN) baselines optimized for the downstream task. Additionally, pre-training on synthetic data demonstrates superior transferability to real-world measured signals compared with training solely on limited real datasets. This study underscores the potential of MAEs for advancing ultrasound signal analysis through scalable, self-supervised learning.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Switzerland (0.04)
- (15 more...)
- Health & Medicine > Diagnostic Medicine (0.46)
- Health & Medicine > Consumer Health (0.34)
Dynamic Acoustic Model Architecture Optimization in Training for ASR
Xu, Jingjing, Yang, Zijian, Zeyer, Albert, Beck, Eugen, Schlueter, Ralf, Ney, Hermann
Architecture design is inherently complex. Existing approaches rely on either handcrafted rules, which demand extensive empirical expertise, or automated methods like neural architecture search, which are computationally intensive. In this paper, we introduce DMAO, an architecture optimization framework that employs a grow-and-drop strategy to automatically reallocate parameters during training. This reallocation shifts resources from less-utilized areas to those parts of the model where they are most beneficial. Notably, DMAO only introduces negligible training overhead at a given model complexity. We evaluate DMAO through experiments with CTC on LibriSpeech, TED-LIUM-v2 and Switchboard datasets. The results show that, using the same amount of training resources, our proposed DMAO consistently improves WER by up to 6% relatively across various architectures, model sizes, and datasets. Furthermore, we analyze the pattern of parameter redistribution and uncover insightful findings.
- Asia > South Korea > Incheon > Incheon (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (19 more...)
FinRobot: Generative Business Process AI Agents for Enterprise Resource Planning in Finance
Yang, Hongyang, Lin, Likun, She, Yang, Liao, Xinyu, Wang, Jiaoyang, Zhang, Runjia, Mo, Yuquan, Wang, Christina Dan
Enterprise Resource Planning (ERP) systems serve as the digital backbone of modern financial institutions, yet they continue to rely on static, rule-based workflows that limit adaptability, scalability, and intelligence. As business operations grow more complex and data-rich, conventional ERP platforms struggle to integrate structured and unstructured data in real time and to accommodate dynamic, cross-functional workflows. In this paper, we present the first AI-native, agent-based framework for ERP systems, introducing a novel architecture of Generative Business Process AI Agents (GBPAs) that bring autonomy, reasoning, and dynamic optimization to enterprise workflows. The proposed system integrates generative AI with business process modeling and multi-agent orchestration, enabling end-to-end automation of complex tasks such as budget planning, financial reporting, and wire transfer processing. Unlike traditional workflow engines, GBPAs interpret user intent, synthesize workflows in real time, and coordinate specialized sub-agents for modular task execution. We validate the framework through case studies in bank wire transfers and employee reimbursements, two representative financial workflows with distinct complexity and data modalities. Results show that GBPAs achieve up to 40% reduction in processing time, 94% drop in error rate, and improved regulatory compliance by enabling parallelism, risk control insertion, and semantic reasoning. These findings highlight the potential of GBPAs to bridge the gap between generative AI capabilities and enterprise-grade automation, laying the groundwork for the next generation of intelligent ERP systems.
- South America > Colombia > Bolivar Department > Cartagena (0.04)
- North America > United States > Hawaii (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Banking & Finance (1.00)
- Information Technology > Security & Privacy (0.46)
- Information Technology > Enterprise Applications > Enterprise Resource Planning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)
Mutual Evidential Deep Learning for Medical Image Segmentation
He, Yuanpeng, Bi, Yali, Li, Lijian, Pun, Chi-Man, Jiao, Wenpin, Jin, Zhi
Existing semi-supervised medical segmentation co-learning frameworks have realized that model performance can be diminished by the biases in model recognition caused by low-quality pseudo-labels. Due to the averaging nature of their pseudo-label integration strategy, they fail to explore the reliability of pseudo-labels from different sources. In this paper, we propose a mutual evidential deep learning (MEDL) framework that offers a potentially viable solution for pseudo-label generation in semi-supervised learning from two perspectives. First, we introduce networks with different architectures to generate complementary evidence for unlabeled samples and adopt an improved class-aware evidential fusion to guide the confident synthesis of evidential predictions sourced from diverse architectural networks. Second, utilizing the uncertainty in the fused evidence, we design an asymptotic Fisher information-based evidential learning strategy. This strategy enables the model to initially focus on unlabeled samples with more reliable pseudo-labels, gradually shifting attention to samples with lower-quality pseudo-labels while avoiding over-penalization of mislabeled classes in high data uncertainty samples. Additionally, for labeled data, we continue to adopt an uncertainty-driven asymptotic learning strategy, gradually guiding the model to focus on challenging voxels. Extensive experiments on five mainstream datasets have demonstrated that MEDL achieves state-of-the-art performance.
- Asia > Macao (0.14)
- North America > Canada > British Columbia > Vancouver (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- (10 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
AI Alignment in Medical Imaging: Unveiling Hidden Biases Through Counterfactual Analysis
Ma, Haroui, Quinzan, Francesco, Willem, Theresa, Bauer, Stefan
Machine learning (ML) systems for medical imaging have demonstrated remarkable diagnostic capabilities, but their susceptibility to biases poses significant risks, since biases may negatively impact generalization performance. In this paper, we introduce a novel statistical framework to evaluate the dependency of medical imaging ML models on sensitive attributes, such as demographics. Our method leverages the concept of counterfactual invariance, measuring the extent to which a model's predictions remain unchanged under hypothetical changes to sensitive attributes. We present a practical algorithm that combines conditional latent diffusion models with statistical hypothesis testing to identify and quantify such biases without requiring direct access to counterfactual data. Through experiments on synthetic datasets and large-scale real-world medical imaging datasets, including \textsc{cheXpert} and MIMIC-CXR, we demonstrate that our approach aligns closely with counterfactual fairness principles and outperforms standard baselines. This work provides a robust tool to ensure that ML diagnostic systems generalize well, e.g., across demographic groups, offering a critical step towards AI safety in healthcare. Code: https://github.com/Neferpitou3871/AI-Alignment-Medical-Imaging.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (12 more...)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation
Ni, Tongke, Fan, Yang, Zhou, Junru, Wu, Xiangping, Chen, Qingcai
Text semantic segmentation involves partitioning a document into multiple paragraphs with continuous semantics based on the subject matter, contextual information, and document structure. Traditional approaches have typically relied on preprocessing documents into segments to address input length constraints, resulting in the loss of critical semantic information across segments. To address this, we present CrossFormer, a transformer-based model featuring a novel cross-segment fusion module that dynamically models latent semantic dependencies across document segments, substantially elevating segmentation accuracy. Additionally, CrossFormer can replace rule-based chunk methods within the Retrieval-Augmented Generation (RAG) system, producing more semantically coherent chunks that enhance its efficacy. Comprehensive evaluations confirm CrossFormer's state-of-the-art performance on public text semantic segmentation datasets, alongside considerable gains on RAG benchmarks.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- South America > Colombia > Bolivar Department > Cartagena (0.04)
- North America > United States > New Mexico > Doña Ana County > Las Cruces (0.04)
- (12 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)